Exploring metadata of the human vaginal microbiome

Group 6

Alberte Englund
Mathilde Due
Line Winther Gormsen
Sigrid Frandsen
Kristine Johansen

STUDY DESCRIPTION

Metadata from MGnify’s vaginal microbiome genome catalogue

  • Uncover patterns in genome quality, taxonomic composition, and ecological characteristics.

  • Look for patterns that could support diagnosis of endometriosis via associated pathogens:

    • Anaerococcus, Ureaplasma, Gardnerella, Veillonella, Corynebacterium, Peptoniphilus, Candida albicans, Alloscardovia
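
The list above can be turned into a simple lookup that flags a genome as endometriosis-associated. This is a minimal sketch, not the group's actual code: the function name `is_endometriosis_associated` is hypothetical, and it assumes that genus and species names are already available as plain strings.

```python
# Taxa listed in the study description. Candida albicans is given at
# species level, the rest at genus level.
ASSOCIATED_GENERA = {
    "Anaerococcus", "Ureaplasma", "Gardnerella", "Veillonella",
    "Corynebacterium", "Peptoniphilus", "Alloscardovia",
}
ASSOCIATED_SPECIES = {"Candida albicans"}


def is_endometriosis_associated(genus, species=""):
    """Return True if the genus or species is on the list above.

    Hypothetical helper: assumes exact, case-sensitive name matches.
    """
    return genus in ASSOCIATED_GENERA or species in ASSOCIATED_SPECIES


if __name__ == "__main__":
    print(is_endometriosis_associated("Gardnerella", "Gardnerella vaginalis"))
    print(is_endometriosis_associated("Lactobacillus", "Lactobacillus crispatus"))
```

Exact matching is a deliberate simplification; taxonomy names that carry suffixes or alternative spellings would need to be normalised first.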

DATA CLEANING AND WRANGLING

Untidy → tidy data

  1. Each variable is saved in its own column.
  2. Each observation is saved in its own row.
  3. Each “type” of observation is stored in a single table.

Before tidying (raw catalogue metadata):
# A tibble: 618 × 20
   Genome        Genome_type  Length N_contigs    N50 GC_content Completeness
   <chr>         <chr>         <dbl>     <dbl>  <dbl>      <dbl>        <dbl>
 1 MGYG000303700 MAG          678213         2 466332       47.8         63.7
 2 MGYG000303701 MAG         1500176        18 112881       42.4         87.8
 3 MGYG000303702 MAG         1210062        44  48790       26.4         94.8
 4 MGYG000303703 MAG         1706016        27  89653       44.6         93.7
 5 MGYG000303704 MAG          703182         7 111709       47.8         63.7
 6 MGYG000303705 MAG         2542045       112  34925       48           97.9
 7 MGYG000303706 MAG         1449687       185  10153       34.8         85.2
 8 MGYG000303707 MAG         1874692        90  28768       37.1         99.0
 9 MGYG000303708 MAG         1480380        12 169949       42.2         87.6
10 MGYG000303709 MAG          694644        57  15063       47.9         62.0
# ℹ 608 more rows
# ℹ 13 more variables: Contamination <dbl>, rRNA_5S <dbl>, rRNA_16S <dbl>,
#   rRNA_23S <dbl>, tRNAs <dbl>, Genome_accession <chr>, Species_rep <chr>,
#   Lineage <chr>, Sample_accession <chr>, Study_accession <chr>,
#   Country <chr>, Continent <chr>, FTP_download <chr>

After tidying (taxonomy split into ranks, quality and endometriosis flags added):
# A tibble: 618 × 25
   Genome        Genome_type  Length N_contigs    N50 GC_content Completeness
   <chr>         <chr>         <dbl>     <dbl>  <dbl>      <dbl>        <dbl>
 1 MGYG000303700 MAG          678213         2 466332       47.8         63.7
 2 MGYG000303701 MAG         1500176        18 112881       42.4         87.8
 3 MGYG000303702 MAG         1210062        44  48790       26.4         94.8
 4 MGYG000303703 MAG         1706016        27  89653       44.6         93.7
 5 MGYG000303704 MAG          703182         7 111709       47.8         63.7
 6 MGYG000303705 MAG         2542045       112  34925       48           97.9
 7 MGYG000303706 MAG         1449687       185  10153       34.8         85.2
 8 MGYG000303707 MAG         1874692        90  28768       37.1         99.0
 9 MGYG000303708 MAG         1480380        12 169949       42.2         87.6
10 MGYG000303709 MAG          694644        57  15063       47.9         62.0
# ℹ 608 more rows
# ℹ 18 more variables: Contamination <dbl>, rRNA_5S <dbl>, rRNA_16S <dbl>,
#   rRNA_23S <dbl>, tRNAs <dbl>, Country <chr>, Continent <chr>, Domain <chr>,
#   Phylum <chr>, Class <chr>, Order <chr>, Family <chr>, Genus <chr>,
#   Species <chr>, Completeness_quality <chr>, Contamination_quality <chr>,
#   Overall_quality <chr>, endometriosis_associated <lgl>
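
The step from a single Lineage column to the seven rank columns (Domain to Species) can be sketched as below. This assumes the lineage string uses semicolon-separated, prefix-tagged ranks such as `d__Bacteria;p__...;s__...`; the example string and the function name `split_lineage` are illustrative and not taken from the dataset.

```python
# Rank columns as they appear in the tidy table.
RANKS = ["Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def split_lineage(lineage):
    """Split one lineage string into a dict with one entry per rank.

    Assumption: ranks are separated by ';' and each carries a prefix
    like 'd__' or 'g__'. Empty or missing ranks become None.
    """
    result = {rank: None for rank in RANKS}
    for rank, part in zip(RANKS, lineage.split(";")):
        part = part.strip()
        # Drop the rank prefix ("d__", "p__", ...) if present.
        name = part.split("__", 1)[1] if "__" in part else part
        result[rank] = name or None
    return result


if __name__ == "__main__":
    example = ("d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;"
               "f__Lactobacillaceae;g__Lactobacillus;s__")
    print(split_lineage(example))
```

Storing each rank in its own column is what makes grouping by phylum or genus possible in the later analyses.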

DATA DESCRIPTION

Overview of the dataset

  • 618 vaginal metagenome-assembled genomes (MAGs)
  • 25 metadata variables (taxonomy, assembly stats, geography, quality)
  • High completeness and low contamination for most MAGs
  • Dominated by a few major bacterial phyla
  • Genome sizes fall within biologically expected ranges
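
The quality columns in the tidy table (Completeness_quality, Contamination_quality, Overall_quality) could be derived along the following lines. The cut-offs are an assumption: the slides do not state them, so this sketch borrows MIMAG-style thresholds (high: completeness above 90% and contamination below 5%; medium: completeness at least 50% and contamination below 10%). The contamination values in the usage example are made up.

```python
# Ordering used to pick the worse of two quality labels.
QUALITY_ORDER = {"low": 0, "medium": 1, "high": 2}


def completeness_quality(completeness):
    """Label completeness (percent). Thresholds are assumed, not from the slides."""
    if completeness > 90:
        return "high"
    if completeness >= 50:
        return "medium"
    return "low"


def contamination_quality(contamination):
    """Label contamination (percent). Thresholds are assumed, not from the slides."""
    if contamination < 5:
        return "high"
    if contamination < 10:
        return "medium"
    return "low"


def overall_quality(completeness, contamination):
    """Overall label is the worse of the two individual labels."""
    labels = [completeness_quality(completeness),
              contamination_quality(contamination)]
    return min(labels, key=QUALITY_ORDER.get)


if __name__ == "__main__":
    # Completeness values as shown in the table; contamination is hypothetical.
    print(overall_quality(97.9, 1.0))
    print(overall_quality(63.7, 1.0))
```

Taking the worse of the two labels keeps a highly complete but contaminated genome from being counted as high quality.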

Figure 1. Genomes per phylum

Figure 2. Completeness distribution

Figure 3. Genome length distribution

ANALYSIS 1

ANALYSIS 2

ANALYSIS 3 – Endometriosis-associated vs. non-associated MAGs

What we compared

  • GC content

  • Genome length

  • Completeness & contamination

  • Phylum distribution
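
A comparison of the numeric variables above between the two groups can be sketched as a grouped median. The rows below are toy values for illustration only, not results from the dataset, and `group_medians` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import median


def group_medians(rows, group_key, value_key):
    """Median of value_key within each level of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: median(values) for group, values in groups.items()}


if __name__ == "__main__":
    # Toy rows; column names follow the tidy table.
    toy = [
        {"endometriosis_associated": True, "GC_content": 40.0},
        {"endometriosis_associated": True, "GC_content": 44.0},
        {"endometriosis_associated": True, "GC_content": 42.0},
        {"endometriosis_associated": False, "GC_content": 35.0},
        {"endometriosis_associated": False, "GC_content": 50.0},
    ]
    print(group_medians(toy, "endometriosis_associated", "GC_content"))
```

The same call works for Length, Completeness, or Contamination by changing `value_key`.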

Key results

  • Endometriosis-associated MAGs cluster in a few phyla

  • GC content and genome size distributions overlap between the two groups

  • Assembly quality is similarly high in both groups

Conclusion

  • Endometriosis-associated MAGs are not genomically distinct on the metrics compared (GC content, genome length, completeness, contamination).

Figure x. Phylum proportion of MAGs
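
The proportions behind a phylum plot of this kind can be computed as a share of MAGs per phylum within one group. This is a sketch with toy labels; `phylum_proportions` is a hypothetical helper, and the input is assumed to be the Phylum values of one group of MAGs.

```python
from collections import Counter


def phylum_proportions(phyla):
    """Fraction of MAGs assigned to each phylum (fractions sum to 1)."""
    counts = Counter(phyla)
    total = sum(counts.values())
    return {phylum: n / total for phylum, n in counts.items()}


if __name__ == "__main__":
    # Toy labels for one group of MAGs, not taken from the dataset.
    toy_group = ["Firmicutes", "Firmicutes", "Actinobacteriota", "Bacteroidota"]
    print(phylum_proportions(toy_group))
```

Computing proportions per group rather than raw counts keeps the two groups comparable when they contain different numbers of MAGs.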

Figure x. GC content distribution

ANALYSIS 4

DISCUSSION

FUTURE PERSPECTIVES